Below we summarize the main R functions that are used in the SNA4DS course. We will not explain the underlying concepts here, but refer you to the lectures, labs, and slides of the course for that.

This “cheatsheet” gives you an overview of the main functions you will need throughout the course. We hope it is a useful reference as you develop and apply your network analysis skills.

NOTE:

Most functions have multiple arguments. We do not show and discuss all of them here, because that would yield a very long and unwieldy document. Instead, we recommend that you use your R skills, the help operator ?, the help function, and the other approaches we teach you in this course to learn about the details of a specific function. If you still can’t figure it out, contact us and we’ll assist you.



1 General SNA4DS functions

1.1 SNA4DS::check_SNA4DS()

This function checks whether a newer version of the SNA4DS package is available on GitHub. If one is available, it also offers to install it for you.

1.2 SNA4DS::SNA4DS_tutorials()

This function lists the currently available tutorials in your installed version of SNA4DS and allows you to pick the one you want to run from a list.

1.3 SNA4DS::make_matrix_from_vertex_attribute()

This function turns a vertex attribute into a matrix with a value for each edge in a graph.

The input is a graph object of class igraph or network, the name (name) of the required attribute, and the name of the function (measure) you want to perform on the attribute.

For example,

SNA4DS::make_matrix_from_vertex_attribute(g, 
                                  name = "vertexAttributeName",
                                  measure = "max")

creates a matrix where cell (i, j) contains the maximum of the values that vertices i and j have on the attribute called “vertexAttributeName”.

The diag argument determines what is put on the diagonal of the matrix. The default is to fill the diagonal with 0’s, but you can override this.

Beware of missing values in the chosen vertex attribute: the function currently does not check for them, so you need to inspect the corresponding cells of the resulting matrix to see whether it did what you wanted.

You can also just provide the function with a numeric vector and it will calculate the matrix for you based on that alone. Of course, in this case you have to make sure yourself that the values in the vector are in the same order as in the network. Here is an example where a vector with values c(1, 2, 3, 4, 5) is turned into a matrix with the absolute difference values in the cells.

SNA4DS::make_matrix_from_vertex_attribute(1:5, measure = "absdiff")

Similarly, a sender effect would be constructed by:

SNA4DS::make_matrix_from_vertex_attribute(1:5, measure = "sender")

Currently, the function implements:

  • absdiff: the absolute difference between values for two vertices
  • diff: the value of the vertex with the lower vertex index minus the value of the vertex with the higher vertex index.
  • sum: the sum of the values for both vertices
  • max: the highest value between the two vertices
  • min: the lowest value between the two vertices
  • mean: the mean between the two vertices
  • sender: the value of the sender’s attribute in the entire row
  • receiver: the value of the receiver’s attribute in the entire column
  • equal: 1 if both vertices have the same value on the attribute, 0 otherwise
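To make a few of these measures concrete, here is a minimal base-R sketch of the matrices they describe. It is an illustration only, not the SNA4DS implementation, and the attribute vector `x` is a made-up example.

```r
# Hypothetical vertex attribute for five vertices, in network order
x <- c(1, 2, 3, 4, 5)
n <- length(x)

absdiff  <- abs(outer(x, x, "-"))          # cell (i, j) = |x_i - x_j|
sender   <- matrix(x, nrow = n, ncol = n)  # row i holds x_i in every column
receiver <- t(sender)                      # column j holds x_j in every row
equal    <- 1 * outer(x, x, "==")          # 1 if x_i equals x_j, 0 otherwise

# Mimic the default described above: zeros on the diagonal
diag(sender)   <- 0
diag(receiver) <- 0
diag(equal)    <- 0

absdiff
sender
```

Comparing such hand-made matrices with the output of the package function is a quick way to check that the vertex order is what you expect.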



2 Generating and manipulating graph objects

2.1 Overview of igraph and network functions

There are two main packages for basic graph generation and manipulation: the igraph package and the statnet package. Actually, statnet is a suite of packages that work together. In this course, we will make use of several packages from the statnet suite.

The igraph package creates a graph object of type igraph. The statnet suite creates a graph object of type network. There are many things you can do in both packages. Both packages can generate graphs and do basic manipulation, so here you should just use the package whose API you like best. The igraph package provides more mathematical functions to apply to the graph data and the statnet suite provides loads of statistical models that the igraph package does not do.

Below, we provide you with an overview of the functions that do basic data generation and manipulation of graph datasets. We show the functions in both packages that do equivalent things.


Network construction and manipulation
igraph network
CREATE
generate an empty network
  igraph::make_empty_graph(
    n = 0, directed = TRUE
  )
  network::network.initialize(
    n,
    directed = TRUE,
    hyper = FALSE,
    loops = FALSE,
    multiple = FALSE,
    bipartite = FALSE
  )
generate a ring network
  igraph::make_ring(10)
generate a star network
  igraph::make_star(10)
create the object from input [1]
  # from dataframe
  igraph::graph_from_data_frame(df)

  # from adjacency matrix
  igraph::graph_from_adjacency_matrix(adjmat)

  # from edgelist
  igraph::graph_from_edgelist(edgelist)
  # from dataframe
  network::network(df)

  # from adjacency matrix
  network::network(adjmat)

  # from edgelist
  network::network(edgelist)

  # general
  network::network(
    x,
    vertex.attr = NULL,
    vertex.attrnames = NULL,
    directed = TRUE,
    hyper = FALSE,
    loops = FALSE,
    multiple = FALSE,
    bipartite = FALSE,
    ...
  )
CREATE RANDOM
random graph with given density [2]
  # 10 vertices, density on average .3
  igraph::sample_gnp(n = 10, p = .3,
      directed = TRUE)

  # 10 vertices, 27 edges (i.e. density = .3)
  igraph::sample_gnm(n = 10, m = 27,
      directed = TRUE)
  # 10 vertices, density on average .3
  sna::rgraph(n = 10, m = 1, tprob = 0.30,
      mode = 'digraph')

  # 10 vertices, 27 edges (i.e. density = .3)
  sna::rgnm(1, 10, m = 27)
random graph with given dyad census [2]
  # 10 vertices, probability per
  # dyad type
  sna::rguman(n = 1, nv = 10,
      mut = 0.25, asym = 0.5, null = 0.25,
      method = 'probability')

  # 10 vertices, exact dyad census
  sna::rguman(n = 1, nv = 10,
      mut = 8, asym = 25, null = 12,
      method = 'exact')
randomly permute the order of the vertices [3]
  igraph::permute(g, sample(igraph::vcount(g)))
  sna::rmperm(g)
INSPECT
number of vertices
  igraph::vcount(g)
  network::network.size(g)
number of edges
  igraph::ecount(g)

  igraph::gsize(g)
  network::network.edgecount(g)
access the vertices
  igraph::V(g)
access the edges
  igraph::E(g)
mixing matrix [4]
  network::mixingmatrix(g, "vertexAttributeName")
EXTRACT
access a graph attribute [5]
  igraph::list.graph.attributes(g)

  igraph::get.graph.attribute(g, 'attributename')

  g$attributename
  network::list.network.attributes(g)

  network::get.network.attribute(g, 'attributename')
access a vertex attribute [5]
  igraph::list.vertex.attributes(g)

  igraph::get.vertex.attribute(g, "attributename")

  igraph::V(g)$attributename
  network::list.vertex.attributes(g)

  network::get.vertex.attribute(g, "attributename")
access vertex names [5, 6]
  igraph::V(g)$name

  igraph::get.vertex.attribute(g, 'name')
  network::network.vertex.names(g)

  network::get.vertex.attribute(g, 'vertex.names')
access an edge attribute [5]
  igraph::list.edge.attributes(g)

  igraph::get.edge.attribute(g, 'attributename')

  igraph::E(g)$attributename
  network::list.edge.attributes(g)

  network::get.edge.attribute(g, 'attributename')
set attributes [5]
  igraph::set.graph.attribute(g, 'name', value)

  g$name <- value

  igraph::set.vertex.attribute(g, 'name', value)

  igraph::V(g)$name <- value

  igraph::set.edge.attribute(g, 'name', value)

  igraph::E(g)$name <- value
  network::set.network.attribute(g, 'name', value)

  network::set.vertex.attribute(g, 'name', value,
        v = seq_len(network::network.size(g)))

  network::set.edge.attribute(g, 'name', value,
        e = seq_along(g$mel))

  network::set.edge.value(g, 'name', value,
        e = seq_along(g$mel))
get a vertex's neighbors
  igraph::neighbors(g, 'Jane', mode = 'out')

  # all options
  igraph::neighbors(graph, v,
    mode = c('out', 'in', 'all', 'total'))
  network::get.neighborhood(g, 1, 'out')

  # all options
  network::get.neighborhood(x, v,
    type = c('out', 'in', 'combined'),
    na.omit = TRUE
  )
get a vertex neighborhood [7]
  igraph::make_ego_graph(g, order = 1,
    nodes = "Jane", mode = "all")

  # all options
  igraph::make_ego_graph(
    graph,
    order = 1,
    nodes = V(graph),
    mode = c("all", "out", "in"),
    mindist = 0
  )
  sna::ego.extract(dat,
    ego = NULL,
    neighborhood = c("combined", "in", "out"))

  sna::neighborhood(dat, order,
    neighborhood.type = c("in", "out", "total"),
    mode = "digraph", diag = FALSE, thresh = 0,
    return.all = FALSE,
    partial = TRUE)
extract a subset from the graph
  # subset based on vertices
  igraph::induced_subgraph(g,
      vids = theVerticesYouWantToKeep)

  # subset based on edges
  igraph::subgraph.edges(g,
      eids = theEdgesYouWantToKeep)
  # subset based on vertices
  network::get.inducedSubgraph(g,
      v = theVerticesYouWantToKeep)

  # subset based on edges
  network::get.inducedSubgraph(g,
      eid = theEdgesYouWantToKeep)
CONVERT
make adjacency matrix
  igraph::as_adjacency_matrix(g, sparse = FALSE)
  network::as.sociomatrix(g)

  network::as.matrix.network(g)
make edgelist [8]
  igraph::as_edgelist(g)

  igraph::as_data_frame(g)
  network::as.data.frame.network(g)

  network::as.edgelist(g)

  sna::as.edgelist.sna(g)
make adjacency list
  igraph::as_adj_list(g)
make the network directed
  igraph::as.directed(g)
make the network undirected
  igraph::as.undirected(g)

  # for example
  igraph::as.undirected(g, mode = 'collapse',
      edge.attrib.comb = list(weight = 'sum'))
  sna::symmetrize(g)

  # for example
  sna::symmetrize(g, rule = 'strong')
remove loops and multiple edges
  igraph::simplify(g)

  # for example
  igraph::simplify(g, remove.multiple = TRUE,
      remove.loops = TRUE,
      edge.attrib.comb = list(weight = 'max'))
project a bipartite graph
  igraph::bipartite.projection(g)
convert to a line graph
  igraph::make_line_graph(g)

1 `network` uses a single function for most input types

2 Useful for a manual CUG test

3 Useful for a manual QAP test

4 Generates a mixing matrix of a vertex attribute

5 R is case sensitive!

6 Assuming the names are in `name` or `vertex.names` (default)

7 These functions serve equivalent purposes, but yield quite different kinds of outputs

8 The `sna` function includes edge weights
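As a small illustration of how several of the igraph entries in the table fit together, the sketch below creates a ring network, inspects it, and converts it. It assumes the igraph package is installed; the vertex names are made up for the example.

```r
# Create an undirected ring network with 10 vertices
g <- igraph::make_ring(10)

# Inspect it
igraph::vcount(g)   # number of vertices
igraph::ecount(g)   # number of edges

# Give the vertices (hypothetical) names, then look up the neighbors of "a"
g  <- igraph::set_vertex_attr(g, "name", value = letters[1:10])
nb <- igraph::neighbors(g, "a", mode = "all")

# Convert to an adjacency matrix and to an edgelist
adj <- igraph::as_adjacency_matrix(g, sparse = FALSE)
el  <- igraph::as_edgelist(g)
```

In a ring every vertex has exactly two neighbors, so `nb` holds two vertices and each row of `adj` sums to 2.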


2.2 SNA4DS functions

The SNA4DS package offers a few functions that assist with the manipulation of graph data in R.

  • SNA4DS::makeEdgelist(names = NULL, attribute = NULL)

The input is a data.frame (names) with edge information. The attribute is a vector that contains a node attribute for those vertices.

The function returns a vector or data.frame that can be read into igraph or network.

  • SNA4DS::makeNodelist(names = NULL, attribute = NULL)

The input is a data.frame (names) with edge information. The attribute is another data.frame that contains the values of those edges.

The function returns an edgelist that can be read into igraph or network.

  • SNA4DS::extract_all_vertex_attributes(g)

The SNA4DS::extract_all_vertex_attributes(g) function extracts all vertex attributes from a graph object and puts them together into a data.frame. The function works with both igraph and network class objects.




3 intergraph: moving between igraph and network

There are two ways to convert a graph object between the two classes. The first is to convert the object into another representation and import that into the other package. For example, one could first convert an igraph object into an adjacency matrix and then read that matrix into network.

The second, more straightforward way to convert between the two classes is to use the intergraph package. Conversion is the sole purpose of this package.

You coerce a network object into an igraph object as follows:

intergraph::asIgraph(g)

You coerce an igraph object into a network object as follows:

intergraph::asNetwork(g)

The intergraph package also has a useful function that turns a network or igraph object into data.frames (one for the edges and one for the vertices):

intergraph::asDF(g)

If you have a network in both igraph and network versions, you can check if they are (nearly) the same through the following intergraph function:

# perform the test, result is TRUE or FALSE
intergraph::netcompare(network1, network2, test = TRUE)

# return the results of all the tests
intergraph::netcompare(network1, network2, test = FALSE)
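The first route mentioned above (going through an intermediate representation) can be sketched as follows. This is a minimal example that assumes the igraph and network packages are installed. Note that vertex and edge attributes are not carried over this way, which is one reason to prefer intergraph.

```r
# Start from an igraph object: an undirected star with 5 vertices
g_igraph <- igraph::make_star(5, mode = "undirected")

# Step 1: convert it to a plain adjacency matrix
adj <- igraph::as_adjacency_matrix(g_igraph, sparse = FALSE)

# Step 2: read the adjacency matrix into the network package
g_network <- network::network(adj, directed = FALSE)

# Both objects should describe the same structure
network::network.size(g_network)       # number of vertices
network::network.edgecount(g_network)  # number of edges
```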




4 Transforming data into igraph and network classes

Very often the network data you want to manipulate with igraph and network does not come in the right format and you need to create the object. Below you can find the most popular functions that help you to do so.

4.1 igraph

  • Load a graph with attributes
nodes <- utils::read.csv("file-NODES.csv", header = TRUE, as.is = TRUE)
links <- utils::read.csv("file-EDGES.csv", header = TRUE, as.is = TRUE)

net <- igraph::graph_from_data_frame(d = links, vertices = nodes, directed = TRUE)
  • Make a graph from an adjacency matrix
net <- igraph::graph_from_adjacency_matrix(adj_mat)
  • Make a graph from an edge list
net <- igraph::graph_from_edgelist(edgelist)
  • Add node attributes
net <- igraph::set_vertex_attr(net, 'attr_name', value = c(...))
  • Add edge attribute
net <- igraph::set_edge_attr(net, 'attr_name', value = c(...))
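If you have no csv files at hand, the same workflow can be tried with small data frames typed in by hand. The sketch below assumes igraph is installed; all vertex names, attribute names, and values are invented for illustration.

```r
# Hypothetical node list: first column holds the vertex ids
nodes <- data.frame(id   = c("s01", "s02", "s03"),
                    type = c("paper", "tv", "online"))

# Hypothetical edge list: first two columns are sender and receiver
links <- data.frame(from   = c("s01", "s01", "s02"),
                    to     = c("s02", "s03", "s03"),
                    weight = c(3, 1, 2))

# Extra columns become vertex and edge attributes automatically
net <- igraph::graph_from_data_frame(d = links, vertices = nodes, directed = TRUE)

igraph::V(net)$type    # vertex attribute taken from `nodes`
igraph::E(net)$weight  # edge attribute taken from `links`

# Add a further vertex attribute afterwards
net <- igraph::set_vertex_attr(net, "audience", value = c(20, 35, 50))
```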

4.2 network

  • Make network from adjacency matrix
net <- network::as.network(adj_mat, matrix.type = "adjacency")
  • Make network from edge list
net <- network::as.network(edgelist, matrix.type="edgelist")
  • Add node attributes
net <- network::set.vertex.attribute(net, 'attr_name', value = c(...))
  • Add edge attribute
net <- network::set.edge.attribute(net, 'attr_name', value = c(...))
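A minimal sketch of these steps with a hand-made adjacency matrix, assuming the network package is installed. The matrix and the attribute name `age` are invented for the example.

```r
# Hypothetical directed adjacency matrix for three vertices
adj_mat <- matrix(c(0, 1, 1,
                    0, 0, 1,
                    0, 0, 0), nrow = 3, byrow = TRUE)

# Build the network object from the matrix
net <- network::as.network(adj_mat, matrix.type = "adjacency", directed = TRUE)

# Add a vertex attribute and read it back
net <- network::set.vertex.attribute(net, "age", value = c(23, 31, 40))
network::get.vertex.attribute(net, "age")

# Converting back should reproduce the original ties
network::as.sociomatrix(net)
```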




5 Computing measures

Below you will find a table with the main measures that are covered in the course. When both igraph and network provide a function for the measure, you will find both of them in the table.


Measures at the level of the graph, dyads, and vertices
igraph network
GRAPH LEVEL
density
igraph::edge_density(g)
# preferable for bipartite graphs
network::network.density(g)

# preferable for valued graphs
sna::gden(g)
dyad census
igraph::dyad_census(g)
sna::dyad.census(g)
triad census
igraph::triad_census(g)
sna::triad.census(g,
    mode = 'digraph')
degree assortativity
igraph::assortativity_degree(g,
directed = TRUE)
mean distance
igraph::mean_distance(g,
    directed = TRUE,
    unconnected = TRUE)
diameter
igraph::diameter(g,
    directed = TRUE,
    unconnected = TRUE)

# what is the vertex pair with
# the longest geodesic
igraph::farthest_vertices(g)
centralization [1]
# general function, just include
# any vertex centrality scores
igraph::centralize(scores,
      theoretical.max = 0,
      normalized = TRUE)
# general function, include any function
# that calculates vertex centralities
sna::centralization(g, FUN,
    mode = 'digraph', normalize=TRUE, ...)
specific centralization functions [2]
# betweenness centralization
igraph::centr_betw(g,
    directed = TRUE)$centralization

# closeness centralization
igraph::centr_clo(g,
    mode = 'out',
    normalized = FALSE)$centralization

# degree centralization
igraph::centr_degree(g, mode = 'all')$centralization

# eigenvector centralization
igraph::centr_eigen(g,
    directed = TRUE)$centralization
reciprocity
igraph::reciprocity(g)
sna::grecip(g,
    measure = 'edgewise')
correlation between two graphs
sna::gcor(g1, g2, mode = 'graph')
transitivity [3]
igraph::transitivity(g,
    type = 'global')
sna::gtrans(g, mode = 'digraph',
  measure = 'weak',
  use.adjacency = TRUE)
VERTEX LEVEL
degree
igraph::degree(g,
    mode = 'in')
sna::degree(g, gmode = 'digraph',
    cmode = 'indegree')
betweenness
igraph::betweenness(g,
    directed = TRUE)
sna::betweenness(g, gmode = 'digraph',
    cmode = 'directed')
flow betweenness
sna::flowbet(g, gmode = 'digraph',
    cmode = 'rawflow')
Bonacich power centrality
igraph::power_centrality(g)
sna::bonpow(g, gmode = 'digraph')
closeness centrality [4]
igraph::closeness(g,
    mode = 'all')
sna::closeness(g, gmode = 'digraph',
    cmode = 'directed')
stress centrality
sna::stresscent(g, gmode = 'digraph',
    cmode = 'directed')
eccentricity
igraph::eccentricity(g,
    mode = 'all')
eigenvector centrality
igraph::eigen_centrality(g,
    directed = TRUE,
    scale = FALSE)$vector
sna::evcent(g,
    gmode = 'digraph',
    rescale=FALSE)
DYAD LEVEL
shortest path for a given set of vertices
igraph::all_shortest_paths(g,
    from = IDofVertex,
    to = igraph::V(g),
    mode = 'out')
geodesic lengths [5]
igraph::distances(g,
    mode = 'out')
sna::geodist(g,
    count.paths = FALSE)

# if a count of geodesics is required
sna::geodist(g,
    count.paths = TRUE)$counts
edge betweenness
igraph::edge.betweenness(g,
    directed = FALSE)

1 These functions can calculate centralization of any vertex-level measure

2 `$res` or `$vector` return the centrality scores

3 Results depend quite a bit on the algorithm used

4 Make sure you pick the value for the arguments with care

5 Output is a table with an entry per vertex pair
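A star network is a handy test case for several of these measures, because the values are easy to verify by hand. The sketch below assumes igraph is installed. Setting `loops = FALSE` in `igraph::centr_degree` is a choice made here so that a star reaches the theoretical maximum.

```r
# Undirected star: vertex 1 is the center, vertices 2 to 5 are the leaves
g <- igraph::make_star(5, mode = "undirected")

# Graph level
dens <- igraph::edge_density(g)   # 4 edges out of 10 possible
diam <- igraph::diameter(g)       # leaf to leaf goes via the center

# Vertex level
deg <- igraph::degree(g, mode = "all")
btw <- igraph::betweenness(g, directed = FALSE)

# Centralization: $res holds the vertex scores, $centralization the graph score
dc <- igraph::centr_degree(g, mode = "all", loops = FALSE)
dc$res
dc$centralization

# Dyad level: matrix of geodesic lengths
geo <- igraph::distances(g)
```

The center lies on every shortest path between two leaves, which is why it is the only vertex with a nonzero betweenness.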




6 Communities and other subgroups

Networks often represent complex structures that are not uniformly connected. Very often we can observe sub-groups and communities.

6.1 Manually subset graphs

You might want to separate a subgraph from the rest of the network. With the function induced_subgraph you can do this by referring to the nodes by label or by number:

sub <- igraph::induced_subgraph(net, c('s01','s02'))

sub <- igraph::induced_subgraph(net, 1:7)

6.2 Community detection

There are several algorithms implemented in R that allow the identification of communities inside networks.

6.2.1 Walktrap algorithm

You can determine community structure via short random walks using the walktrap algorithm, as implemented in igraph::cluster_walktrap (formerly igraph::walktrap.community).

You run this analysis as follows:

igraph::cluster_walktrap(g)

You can adjust some of the settings, but the default setting almost always works well. A general analysis approach works as follows:

# run the algorithm
walk <- igraph::cluster_walktrap(g)

# get an overview of the results
print(walk)

# get the modularity score
igraph::modularity(walk)

# who is member of which community
igraph::communities(walk)

# which community is a vertex member of
igraph::membership(walk)

# number of communities
length(walk)

# size of each community
igraph::sizes(walk)

# which edge connects multiple communities
igraph::crossing(walk, g)

# plot the network, highlighting the communities
plot(walk, g)

If you are so inclined, you can plot the community division as a dendrogram, as follows:

plot(stats::as.hclust(walk))

6.2.2 Girvan-Newman algorithm

The Girvan-Newman algorithm is based on edge betweenness centrality. The edge betweenness score of an edge measures the number of shortest paths through it; see edge_betweenness for details. The idea behind community detection based on edge betweenness is that edges connecting separate modules are likely to have high edge betweenness, because all the shortest paths from one module to another must traverse them. So if we gradually remove the edge with the highest edge betweenness score, we get a hierarchical map of the graph: a rooted tree called a dendrogram. The leaves of the tree are the individual vertices and the root of the tree represents the whole graph.

cluster_edge_betweenness performs this algorithm by calculating the edge betweenness of the graph, removing the edge with the highest edge betweenness score, then recalculating edge betweenness of the edges and again removing the one with the highest score, etc.

ng <- igraph::cluster_edge_betweenness(net)

ng

Although this algorithm handles directed networks, the modularity is computed on the undirected version only.

igraph::modularity(ng)

The clusters can be plotted as a dendrogram

igraph::plot_dendrogram(ng)
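To see the idea at work, consider a toy graph in which the answer is obvious: two cliques of five vertices joined by a single bridge. The sketch below assumes igraph is installed and builds the graph itself; the bridge is the edge with the highest edge betweenness, so it is removed first.

```r
# Two complete graphs of 5 vertices each, placed side by side
g <- igraph::disjoint_union(igraph::make_full_graph(5),
                            igraph::make_full_graph(5))

# Connect them with a single bridge between vertex 1 and vertex 6
g <- igraph::add_edges(g, c(1, 6))

# Run the Girvan-Newman algorithm
ng <- igraph::cluster_edge_betweenness(g)

length(ng)               # number of communities
igraph::sizes(ng)        # size of each community
igraph::membership(ng)   # community of each vertex
igraph::modularity(ng)   # modularity of the chosen division

# Edges that run between communities: only the bridge should be flagged
is_crossing <- igraph::crossing(ng, g)
```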

6.2.3 Louvain Algorithm

The function cluster_louvain implements the multi-level modularity optimization algorithm for finding community structure. It is based on the modularity measure and a hierarchical approach.

It can be used only on undirected graphs.

cl <- igraph::cluster_louvain(igraph::as.undirected(net))

cl

Extract the modularity from the assigned variable

cl$modularity

Check which group each node belongs to

data.frame(rbind(cl$names, cl$membership))

Plot the network with clusters

plot(cl, net, vertex.label = NA, vertex.size=5, edge.arrow.size = .2)




7 Plotting

7.1 Basic plotting in igraph

The plot function alone already plots nodes and edges with default options. More sophisticated specifications need to be manually set. It works with networks of class igraph.

plot(net, 
     edge.arrow.size = .2,                # edge and arrow size
     edge.color = "red",                  # edge color
     vertex.color = "blue",               # vertex filling color
     vertex.frame.color = "green",        # vertex perimeter color
     vertex.label = igraph::V(net)$label, # vertex labels
     vertex.label.cex = 0.6,              # vertex label size
     vertex.label.color = "black")        # vertex label color

7.2 Basic plotting in network

The sna::gplot function alone already plots nodes and edges with default options. More sophisticated specifications need to be manually set. It works with networks of class network.

sna::gplot(net,
      arrowhead.cex = 0.2,     # edge and arrow size
      edge.col = 'red',        # edge color
      vertex.col = 'blue',     # vertex filling color
      vertex.border = 'green', # vertex perimeter color
      displaylabels = TRUE,    # vertex labels
      label.cex = 0.6,         # vertex label size
      label.col = 'black')     # vertex label color

7.3 Basic plotting in ggraph

The ggraph function alone does not plot any data. Nodes, edges, and their attributes need to be specified layer after layer. It works both with networks of class network and igraph. It can be fully customized using the ggplot2 toolkit.

ggraph::ggraph(net) +
  # put edges in the plot and make them red with an arrow
  ggraph::geom_edge_fan(color = "red", arrow = grid::arrow(length = grid::unit(4, 'mm'))) +
  # put vertexes in the plot and make them blue with size 5
  ggraph::geom_node_point(color = "blue", size = 5) +   
  # plot labels in black and size 5
  ggraph::geom_node_text(ggplot2::aes(label = media), size = 5, color = "black", repel = TRUE) +
  # set background features
  ggplot2::theme_void() 

7.4 SNA4DS functions

The SNA4DS package contains a function to plot centrality scores of the vertices. The function and its options are specified as follows:

SNA4DS::centralityChart(
  net,
  measures = c("betweenness", "closeness", "degree"),
  directed = igraph::is.directed(net),
  mode = c("all", "out", "in", "total"),
  normalized = TRUE,
  path = FALSE
)

The function takes an object of class igraph and plots three centrality scores, so you can compare them visually. Make sure to pick the required value for mode (the default is “all”). You can leave path at FALSE, which always works. If you want the dots to be connected (which can yield a more insightful plot), set path = TRUE to get a path plot. In some cases this yields a messy plot; if so, set path = FALSE again.




8 Statistical models

8.1 Overview table

Here is an overview of the statistical models discussed in the course.


Statistical network models
When | Which approach | Function
Dependent vertex attribute explained by a network weight matrix and a matrix of covariates | Network autocorrelation model | sna::lnam
Statistic on a single network | Conditional Uniform Graph test | sna::cug.test
Association between two networks | QAP | sna::qaptest
A valued dependent network explained by one or more explanatory networks | QAP linear model | sna::netlm
A binary dependent network explained by one or more explanatory networks | QAP logistic model | sna::netlogit
A binary or valued dependent network explained by a set of endogenous and exogenous variables | Exponential random graph models | ergm::ergm


8.2 Network autocorrelation models

The network autocorrelation model is run through the sna::lnam function. The basic function call is as follows:

sna::lnam(y, x = NULL, W1 = NULL, W2 = NULL)

Here,

  • y is a vector with a value for each vertex. The implementation in sna::lnam is only appropriate for continuous dependent variables.

  • W is a matrix of the same dimension as the network, containing the weights that drive the network influence process. You need to specify W1 and can include a second weight matrix W2 if you want.

  • x is a matrix with a row per vertex. Make sure to include a column with 1’s, so an intercept is included. Make sure to include column names, so you get informative output.

There is a useful summary method (that shows you an overview of the results) and a plot method (that you use to check model assumptions).
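The sketch below shows one way the three inputs could be put together, using simulated data. It assumes the sna package is installed. The row-normalized weight matrix, the variable names, and the coefficient values are illustrative choices, not a recommendation for your own data.

```r
set.seed(1)
n <- 30

# Hypothetical network: random directed graph as an adjacency matrix
A <- sna::rgraph(n, tprob = 0.15)

# Weight matrix: row-normalize the adjacency matrix
# (pmax avoids dividing by zero for vertices without outgoing ties)
W <- A / pmax(rowSums(A), 1)

# Covariate matrix with a named intercept column and one named covariate
x <- cbind(Intercept = 1, age = rnorm(n))

# Simulated continuous dependent variable, one value per vertex
y <- as.numeric(x %*% c(1, 0.5) + rnorm(n))

# Fit the network autocorrelation model and inspect the results
fit <- sna::lnam(y, x = x, W1 = W)
summary(fit)
```

Because `x` has column names, the summary labels the coefficients as Intercept and age rather than by column position.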


8.3 Conditional Uniform Graphs (CUG)

There are two methods to perform a Conditional Uniform Graph test.

The first is to generate the graphs manually and calculate the measure on each graph. Generation of these graphs can be done using igraph::sample_gnm (which conditions on size and on an exact number of edges, and hence on density) or igraph::sample_gnp (which conditions on size and on expected density). The equivalent functions in sna are sna::rgnm and sna::rgraph. See the data generation table for these functions.

The second approach is to use a function that does the graph generation and computes the network measure for you. The preferred function is sna::cug.test, which is specified as follows:

sna::cug.test(g, FUN, mode = c("digraph", "graph"), cmode = c("size", 
    "edges", "dyad.census"), reps = 1000, 
    ignore.eval = TRUE, FUN.args = list())

See the sna help function for details.

Here

  • FUN is the function that needs to be calculated on each graph

  • FUN.args contains any arguments that are required for the function you specified in FUN

  • cmode determines the type of graphs that are drawn (i.e. what you condition on). The options are

    • “size”: this generates graphs with a particular size and density 0.5. You rarely want this.

    • “edges”: this conditions on a specific edge count (or an exact edge value distribution)

    • “dyad.census”: this conditions on a dyad census (or dyad value distribution)

For example, in order to test whether the transitivity in your graph g is exceptional for a network of the same size and density as in g, you would run

sna::cug.test(g, sna::gtrans, cmode = "edges")

It is wise to always explicitly tell the function whether your graph is directed or not, so a better way to specify the previous function is

sna::cug.test(g, mode = "graph", FUN = sna::gtrans, 
              cmode = "edges", reps = 1000, 
              FUN.args = list(mode = "graph"))

Testing the betweenness centralization of your network g could be performed as follows, again conditioning on size and density:

sna::cug.test(g,
              sna::centralization,
              FUN.args = list(FUN = sna::betweenness),
              mode="graph", 
              cmode="edges")

There is also a useful plot method for the result of the CUG test.
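The manual route (the first method described above) can be sketched as follows, here for transitivity and conditioning on size and edge count. The example assumes igraph is installed. The "observed" graph is itself simulated as a stand-in for your own data, and 1000 replications is an arbitrary choice.

```r
set.seed(42)

# Stand-in for the observed network (replace with your own igraph object)
g <- igraph::sample_gnp(n = 20, p = 0.2, directed = FALSE)
n_vertices <- igraph::vcount(g)
n_edges    <- igraph::ecount(g)

# Observed statistic
observed <- igraph::transitivity(g, type = "global")

# Reference distribution: random graphs with the same size and edge count
simulated <- replicate(1000, {
  random_g <- igraph::sample_gnm(n = n_vertices, m = n_edges, directed = FALSE)
  igraph::transitivity(random_g, type = "global")
})

# Proportion of random graphs with transitivity at least as high as observed
p_upper <- mean(simulated >= observed, na.rm = TRUE)
p_upper
```

A small value of `p_upper` would suggest that the observed transitivity is unusually high for a graph of this size and density.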


8.4 QAP test

There are two methods to perform a QAP test.

The first is to manually permute the graph. Generation of these graphs can be done using igraph::permute or sna::rmperm. See the data generation table for these functions.

The second approach is to use a function that does the graph permutation and computes the required measure (typically a correlation) for you. The preferred function is sna::qaptest, which is specified as follows:

sna::qaptest(g, FUN, reps = 1000, ...)

See the sna help function for details.

Here

  • FUN is the function that needs to be calculated after each permutation

  • ... contains any arguments that are required for the function you specified in FUN

Typically, you want to test the correlation between two graphs, as follows:

sna::qaptest(list(firstNetwork, secondNetwork), 
             FUN = sna::gcor, reps = 1000,
             g1 = 1, g2 = 2)

There is a useful summary method and a plot method for the output of the function.
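To show what the permutation does, here is a base-R sketch of the manual route for the correlation between two networks. Both adjacency matrices are simulated for illustration, and the helper `offdiag` is a hypothetical convenience function defined in the example itself.

```r
set.seed(123)
n <- 10

# Two hypothetical directed networks as adjacency matrices;
# the second is a noisy copy of the first
m1 <- matrix(rbinom(n * n, 1, 0.3), n, n)
diag(m1) <- 0
m2 <- m1
flip <- sample(n * n, 15)
m2[flip] <- 1 - m2[flip]
diag(m2) <- 0

# Helper: all cells except the diagonal
offdiag <- function(m) m[row(m) != col(m)]

# Observed correlation between the two networks
observed <- cor(offdiag(m1), offdiag(m2))

# QAP: permute rows and columns of one matrix simultaneously,
# which relabels the vertices but keeps the structure intact
permuted <- replicate(1000, {
  p <- sample(n)
  cor(offdiag(m1[p, p]), offdiag(m2))
})

# Proportion of permutations with a correlation at least as high as observed
p_upper <- mean(permuted >= observed)
```

Applying the same permutation to rows and columns is the essential step: permuting cells freely would destroy the dependence structure that QAP is meant to preserve.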


8.5 QAP linear regression

QAP linear regression is performed through the sna::netlm function. The function looks as follows:

sna::netlm(y, x, intercept = TRUE, mode = "digraph", 
    nullhyp = "qapspp", reps = 1000)

Make sure to always set intercept = TRUE and nullhyp = "qapspp". For small networks, 1000 replications should be enough; for larger networks you should typically use a higher number (say, 2000).

As an example, this is how you specify a model where graph g is modeled as a linear function of graphs g1, g2, and g3.

mod <- sna::netlm(y = g, x = list(g1, g2, g3), intercept = TRUE,
                              nullhyp = 'qapspp', reps = 1001)
mod$names <- c("Intcpt", "Net1", "Net2", "Net3")
summary(mod)

It is wise to add the names of the networks to the output object, like you see above. That is not strictly necessary, but it makes the output of the function easier to read.


8.6 QAP logistic regression

QAP logistic regression is performed through the sna::netlogit function. The function looks as follows:

sna::netlogit(y, x, intercept = TRUE, mode = "digraph", 
    nullhyp = "qapspp", reps = 1000)

Make sure to always set intercept = TRUE and nullhyp = "qapspp". For small networks, 1000 replications should be enough; for larger networks you should typically use a higher number (say, 2000).

As an example, this is how you specify a model where binary graph g is modeled as a function of graphs g1, g2, and g3.

mod <- sna::netlogit(g, list(g1, g2, g3), 
                     intercept = TRUE,
                     nullhyp = "qapspp", reps = 1001)
mod$names <- c("Intcpt", "Net1", "Net2", "Net3")
summary(mod)




8.7 Exponential Random Graph Model (ERGM)

An ERGM is fitted through the ergm::ergm function. The basic function call is as follows:

fit <- ergm::ergm(formula)

The formula requires the specification of a network dependent variable, and a list of terms.

Terms can be classified in three main ways.

  • Dyadic independent and dyadic dependent terms: We encounter the first kind when the probability of edge formation is related to node properties or attributes; we encounter the second kind when the probability of edge formation depends on other existing edges.

  • Structural and nodal attribute terms: The first kind provides tools to understand the structure of the network per se; the second kind provides tools to explain how nodal attributes might have influenced the formation of edges.

  • Terms for directed networks and terms for undirected networks

8.7.4 Terms specifications

Use the argument levels within the term specification for selecting the baseline or reference category.

Example: set female as a reference category.

fit <- ergm::ergm(Net ~ edges + nodefactor('sex', levels = -(2)))

8.7.5 Searching for terms

You can look for additional terms with

ergm::search.ergmTerms(keyword, net, categories, name)

You have four arguments to help you find terms:

  • keyword: optional character keyword to search for in the text of the term descriptions. Only matching terms will be returned. Matching is case insensitive.

  • net: a network object that the term would be applied to, used as a template to determine directedness, bipartiteness, etc.

  • categories: optional character vector of category tags to use to restrict the results (e.g. ‘curved’, ‘triad-related’); see the categorization of terms in the manual

  • name: optional character name of a specific term to return

8.7.6 Checking your data before the analysis

Before you run any exponential random graph model you must know your data by heart. This means not only computing descriptive network statistics, but also checking the model specification before hitting the run button.

  • Manually check the attribute(s) (numeric, integer, categorical, ordinal)
table(network::get.vertex.attribute(Net, 'sex'))
  • check mixing of categorical attributes
network::mixingmatrix(Net, "sex")
  • check model statistics.
summary(Net ~ edges + nodefactor('sex'))

This last one provides the number of observed cases under the assumptions of each term.

8.7.7 Reading results

You interpret ERGM results like the results of a logit model. Two options:

  • Compute the odds ratio for each coefficient
OR <- exp(coef(fit))
  • Compute the probability for each coefficient
P <- exp(coef(fit)) / (1 + exp(coef(fit)))
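
As a worked illustration of both conversions, the sketch below uses made-up coefficient values. The names and numbers are hypothetical and do not come from a real model; with a fitted model you would start from coef(fit) instead.

```r
# Hypothetical ERGM coefficients on the log-odds scale (illustration only).
# With a real model you would use: coefs <- coef(fit)
coefs <- c(edges = -1.5, nodematch.sex = 0.8)

# Odds ratio for each coefficient
OR <- exp(coefs)

# Probability for each coefficient (inverse logit);
# this is the same computation as stats::plogis(coefs)
P <- exp(coefs) / (1 + exp(coefs))

# Show the three scales side by side
round(cbind(coefficient = coefs, OR = OR, P = P), 3)
```

A negative coefficient gives an odds ratio below 1 and a probability below 0.5; a positive coefficient gives the opposite.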

8.7.8 Simulating networks

It is sometimes helpful to simulate networks with the same features as the one you observed in real life.

  • Simulating a network from a model
fit <- ergm::ergm(Net ~ edges)
simfit <- simulate(fit, burnin = 1e+6, verbose = TRUE, seed = 9)
  • Simulating a network with fixed coefficient values

RandomNet <- network::network(16,density=0.1,directed=FALSE)

sim <- simulate(~ edges + kstar(2), nsim = 2, coef = c(-1.8, 0.03),
                  basis = RandomNet, 
                  control = ergm::control.simulate(
                    MCMC.burnin=1000,
                    MCMC.interval=100))
sim[[1]]

8.7.9 MCMC Diagnostics

You can check the Markov chain Monte Carlo (MCMC) diagnostics for your dyadic dependent model using the function:

ergm::mcmc.diagnostics(fit)

8.7.10 Goodness of Fit

You can check the goodness of fit of your model using the function

ergm::gof(fit)

You can also plot your gof output

plot(ergm::gof(fit))
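
The steps from the preceding subsections can be chained into one small workflow. The sketch below is only an illustration: it assumes the Florentine marriage data (flomarriage) that ships with the ergm package, and the model with edges and nodecov("wealth") is chosen purely as an example, not as a recommended specification.

```r
library(ergm)  # also attaches the network package

# Example data shipped with ergm: the Florentine marriage network
data(florentine, package = "ergm")

# Check the observed model statistics before fitting
summary(flomarriage ~ edges + nodecov("wealth"))

# Fit a simple dyadic independent model (illustrative specification)
fit <- ergm::ergm(flomarriage ~ edges + nodecov("wealth"))
summary(fit)

# Odds ratios for the estimated coefficients
exp(coef(fit))

# Goodness of fit: compute once, then print and plot
fit_gof <- ergm::gof(fit)
print(fit_gof)
plot(fit_gof)
```

Because this model has no dyadic dependent terms, there is no MCMC sample to diagnose; ergm::mcmc.diagnostics(fit) becomes relevant once you add such terms.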




8.8 ERGM for temporal networks

What                             How to store
Time-varying dyadic covariates   Either as a list of networks or matrices
Constant dyadic covariates       Single network or matrix
Node level attributes            As vertex attributes inside the observed network objects


Temporal effects for the ERGM (meaning and btergm term)

Memory

  • Positive autoregression: Previously existing edges persist in the next network.
    btergm::memory(type = "autoregression", lag = 1)

  • Dyadic stability: Both previously existing and non-existing ties are carried over to the current network.
    btergm::memory(type = "stability", lag = 1)

  • Edge innovation: A previously non-existing tie comes into existence in the current network.
    btergm::memory(type = "innovation", lag = 1)

  • Edge loss: A previously existing tie is dissolved in the current network.
    btergm::memory(type = "loss", lag = 1)

Delayed reciprocity

  • Reciprocity: If node j is tied to node i at t = 1, does this lead to a reciprocation of that tie back from i to j at t = 2?
    btergm::delrecip(mutuality = FALSE, lag = 1)

  • Mutuality: If node j is tied to node i at t = 1, does this lead to a reciprocation of that tie back from i to j at t = 2 AND if i is not tied to j at t = 1, will this lead to j not being tied to i at t = 2? This captures a trend away from asymmetry.
    btergm::delrecip(mutuality = TRUE, lag = 1)

Time covariates

  • Time effect per se: Test for a specific trend (linear or non-linear) in edge formation.
    btergm::timecov(transform = function(t) t)

  • Time effect of a covariate: Interaction effect to test whether the importance of a covariate increases or decreases over time.
    btergm::timecov(x, transform = function(t) t)
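
To show how such temporal terms enter a model, here is a minimal sketch that assumes the btergm package and uses ten randomly generated networks, so the estimates themselves are meaningless. The choice of terms and R = 100 bootstrap replications are arbitrary illustrations. In this sketch the terms are written inside the formula without a package prefix, as in the examples of the btergm documentation.

```r
library(btergm)
library(network)

set.seed(5)

# Ten hypothetical directed networks of 10 nodes each (random data, illustration only)
networks <- lapply(1:10, function(i) {
  mat <- matrix(rbinom(100, 1, 0.25), nrow = 10, ncol = 10)
  diag(mat) <- 0
  network::network(mat, directed = TRUE)
})

# Temporal ERGM with dyadic stability and delayed reciprocity,
# estimated with 100 bootstrap replications
fit <- btergm::btergm(
  networks ~ edges +
    memory(type = "stability", lag = 1) +
    delrecip(mutuality = FALSE, lag = 1),
  R = 100
)

summary(fit)
```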


gof + gofplot

9 Temporal networks (exploration and description)

The main packages to use in this course for descriptive and exploratory analysis of temporal networks are networkDynamic (to construct and manipulate temporal networks), tsna (for sna-like network measures), and ndtv (for visualization).

Edges will typically have a starting time (onset), an end time (terminus), a duration, a sender (tail), and a receiver (head). Of course, edges can start and end multiple times during the observation period and can have any non-negative duration, including 0.

The temporal networks are of class networkDynamic.

9.1 Network generation and manipulation

  • networkDynamic::networkDynamic: construction of a temporal network. There are many ways in which you can construct a temporal network. A common way is to first construct a network that has the vertex names, any vertex static attributes, edge attributes, whether the network is directed, et cetera.
    This network is called base.net and is used by this function to extract the basic aspects of the network. Don’t worry that some values (e.g., vertex attributes) may change over time, because any temporal info you add to this function will override what is in base.net. But base.net is an excellent and efficient way to provide much data about the temporal network to the function, and it is more cumbersome to add that later on.
    Further, you can provide dynamic data through data.frames for vertices and for edges in several ways. Consult the help function for the details, as this vignette would become far too long otherwise.

  • as.data.frame(g) Extract the dynamic edge info from the network, as a data.frame.

Most of the functions below allow you to specify a time segment you are interested in. Typically, these include onset, terminus, length, and at. Below, we give only one example of how each function can be specified.

  • networkDynamic::list.vertex.attributes.active(g, onset = 5, terminus = 8) List the attributes of the vertices that are active in a specific time segment.

  • networkDynamic::get.vertex.attribute.active(g, "attrName", at = 1) The value for vertex attribute attrName in a specific time segment.

  • networkDynamic::list.edge.attributes.active(g, onset = 0, terminus = 49) List the attributes of the edges that are active in a specific time segment.

  • networkDynamic::get.edge.attribute.active(g, "attrName", at = 1) The value for edge attribute attrName in a specific time segment.

  • networkDynamic::network.extract(g, onset = 0, terminus = 1) Extract the part of the temporal network for a specific time segment.

  • networkDynamic::network.collapse(g, onset = 0, terminus = 1) Collapse the temporal network into a static network based on the activity within a specific time segment.

  • networkDynamic::activate.vertex.attribute, networkDynamic::activate.edge.attribute, activate.edge.value, activate.network.attribute Set or modify attributes within a specific time segment.

  • deactivate.vertex.attribute, deactivate.edge.attribute, deactivate.network.attribute Make an attribute inactive during a specific time segment.

NOTE: The functions above for accessing and setting the attributes of a networkDynamic object are not very user friendly. Luckily, you can also access and/or set attributes using the network package like in the network manipulation table. As long as you want to access and/or set attributes that are static, this works much easier and uses functions that you have used multiple times already in this course and should be second nature to you by now.
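
As a small illustration of construction and time-segment extraction, the sketch below builds a temporal network from a data.frame of edge spells. The spell values are hypothetical toy data; the column order (onset, terminus, tail, head) is the one expected by the edge.spells argument.

```r
library(networkDynamic)  # also attaches the network package

# Hypothetical edge spells: one row per period during which an edge is active
spells <- data.frame(
  onset    = c(0, 1, 2, 4),
  terminus = c(2, 3, 5, 6),
  tail     = c(1, 2, 3, 1),
  head     = c(2, 3, 4, 4)
)

# Construct the temporal network from the spells
g <- networkDynamic::networkDynamic(edge.spells = spells)

# Dynamic edge information as a data.frame
as.data.frame(g)

# The part of the temporal network that is active between time 0 and 2
g_early <- networkDynamic::network.extract(g, onset = 0, terminus = 2)

# Static network of the ties that are active at time 1
g_at1 <- networkDynamic::network.collapse(g, at = 1)
network::network.edgecount(g_at1)
```

The objects g_early and g_at1 differ in kind: network.extract returns a networkDynamic object, whereas network.collapse returns a static network.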

9.2 Network measures and descriptives

  • networkDynamic::duration.matrix(g, changes, start, end) This function takes a given temporal network g, a matrix with columns “time”, “tail”, “head” (this matrix is called a toggle list), and a start and end time. It returns a data.frame that lists the edges and their activity spells. A toggle represents a switch from the active state to the inactive state, or vice versa.

  • networkDynamic::network.size.active(g, onset = 5, length = 10) The size of a network during a specific time segment.

The following functions provide useful descriptives of durations in the temporal network.

  • tsna::edgeDuration(g, mode = "duration") or tsna::edgeDuration(g, mode = "counts") Sums the activity duration or number of edge events in a time segment.

  • tsna::vertexDuration(g, mode = "duration") or tsna::vertexDuration(g, mode = "counts") Sums the activity duration or number of vertex events in a time segment.

  • tsna::tiedDuration(g, mode = "duration") Measures the total amount of time each vertex has ties.

  • tsna::tiedDuration(g, mode = "counts") Computes the total number of edge spells each vertex is tied by.

The functions tsna::tEdgeFormation and tsna::tEdgeDissolution compute the number of edges forming or dissolving at time points over a time segment. If result.type = 'fraction' the fraction of the number of edges formed (or dissolved) is computed.

  • tsna::tEdgeFormation(g, start = 1, end = 4, time.interval = 1) Counts at times 1, 2, 3, and 4.

  • tsna::tEdgeDissolution(g, start = 1, end = 4, time.interval = 1) Counts at times 1, 2, 3, and 4.
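
The sketch below applies these descriptives to a hypothetical toy network built from four edge spells. The data are made up for illustration only.

```r
library(networkDynamic)
library(tsna)

# Hypothetical toy network: four edges, each active during one spell
spells <- data.frame(
  onset    = c(0, 1, 2, 4),
  terminus = c(2, 3, 5, 6),
  tail     = c(1, 2, 3, 1),
  head     = c(2, 3, 4, 4)
)
g <- networkDynamic::networkDynamic(edge.spells = spells)

# Total active time per edge, and number of activity spells per edge
dur <- tsna::edgeDuration(g, mode = "duration")
cnt <- tsna::edgeDuration(g, mode = "counts")
dur
cnt

# Total amount of time each vertex has ties
tsna::tiedDuration(g, mode = "duration")

# Number of edges forming and dissolving at times 0, 1, ..., 6
formation   <- tsna::tEdgeFormation(g, start = 0, end = 6, time.interval = 1)
dissolution <- tsna::tEdgeDissolution(g, start = 0, end = 6, time.interval = 1)
formation
dissolution
```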

9.2.1 Calculating measures from sna over time

You can calculate any measure from the sna package on a collapsed time segment or a series of collapsed time segments through the tsna::tSnaStats function. These measures can be vertex-level statistics (e.g., sna::betweenness) or graph-level measures (e.g., sna::grecip). You specify which function you want to calculate and the time segments they should be calculated on. The function returns a time series, which makes the outcomes easy to plot.
For example, suppose you want to calculate transitivity for intervals that are 5 time points wide. The following call calculates transitivity for the time intervals [0-5), [5-10), [10-15), etc.:

tsna::tSnaStats(g, snafun = "gtrans", time.interval = 5, aggregate.dur = 5)

This can cause some sudden shifts of values, so it is often more informative to use overlapping segments. So, let us calculate density for windows of width 10, at intervals of 3. This calculates density for intervals 0-10, 3-13, 6-16, et cetera:

tsna::tSnaStats(g, snafun = "gden", time.interval = 3, aggregate.dur = 10)
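
To see which windows a given combination of time.interval and aggregate.dur implies, the arithmetic can be written out in base R. This is only a sketch of the window logic described above, not a call to tsna; the variable names mirror the arguments, and the starting time of 0 and the number of windows (4) are assumptions for the illustration.

```r
time.interval <- 3    # distance between the starts of consecutive windows
aggregate.dur <- 10   # width of each window

# Onsets of the first four windows, assuming the observation period starts at 0
onset <- seq(from = 0, by = time.interval, length.out = 4)

# Each window runs from its onset to onset + aggregate.dur
windows <- data.frame(onset = onset, terminus = onset + aggregate.dur)
windows
```

Whenever aggregate.dur is larger than time.interval, as here, consecutive windows overlap.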

9.2.2 Calculating ergm terms over time

The tsna package also allows you to compute ergm terms for specific time segments. Because the model terms provided by the ergm package (and its various add-ons) are ‘change statistics’ (that determine the effect of changing a single tie on the overall network structure), you can use these terms to describe the network within specific time segments. You specify which terms you want to calculate using a formula.

For example, tsna::tErgmStats(g,'~edges + degree(c(1, 2))', start = 3, end = 10) calculates the number of edges (edges) and the values for degree(1) and degree(2) for each specified time segment. The output is a time series (with a column for each statistic) and can simply be plotted using plot. This plots the time series of the terms one above the other, so you can see how all of them develop over time.

data(windsurfers, package = "networkDynamic")
plot(tsna::tErgmStats(windsurfers,'~edges + degree(2) + kstar(3)', 
                      aggregate.dur = 5), main = "ERGM terms over time")

9.2.3 Participation shifts

In the lecture, we discussed participation shifts, also known as P-shifts. Gibson (2003) defined 13 P-shifts, and the tsna::pShiftCount function counts how often each type occurs in a specific time segment. This is how Gibson describes each of the thirteen types:

knitr::include_graphics("pshifts.png")

  • tsna::pShiftCount(g, start = 1, end = 3) Calculates the number of times each of the above P-shifts occurred during the specified time segment. In other words, this calculates the P-shift census.

9.2.4 Temporal paths

The tsna::tPath function calculates the set of temporally reachable vertices from a given source vertex starting at a specific time.

  • tsna::tPath(g, v = 12, direction = "fwd", start = 0, end = 3) This calculates the temporal paths from vertex 12 to all other vertices, from the start of the specified time segment. When direction = "bkwd", it determines the paths to vertex 12. You can further specify whether to find the paths that arrive earliest or the ones that leave the vertex at the latest possible time.

The most relevant parts of the resulting object generally are:

  • tdist The time each specific path takes. When a path does not exist, the value is Inf.

  • gsteps The length of the path (in terms of the number of steps). When a path does not exist, the value is Inf.
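
The sketch below shows how these two elements can be read from the result. It uses a hypothetical four-vertex toy network, so the source vertex is 1 rather than 12.

```r
library(networkDynamic)
library(tsna)

# Hypothetical toy network: a chain 1 -> 2 -> 3 -> 4 plus a late tie 1 -> 4
spells <- data.frame(
  onset    = c(0, 1, 2, 4),
  terminus = c(2, 3, 5, 6),
  tail     = c(1, 2, 3, 1),
  head     = c(2, 3, 4, 4)
)
g <- networkDynamic::networkDynamic(edge.spells = spells)

# Forward temporal paths from vertex 1, starting at time 0
paths <- tsna::tPath(g, v = 1, direction = "fwd", start = 0)

# Time needed to reach each vertex (Inf if unreachable)
paths$tdist

# Number of steps on the path to each vertex (Inf if unreachable)
paths$gsteps
```

Both vectors have one entry per vertex, in vertex order, so paths$tdist[3] is the time needed to reach vertex 3 from the source.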

The tsna::plotPaths function plots the network and highlights the calculated temporal paths from the chosen vertex (vertex 12, in the example above). It can also add a label to each edge, so you can see how much time it takes for that edge to be activated from this focal vertex. You can tweak the plot like you would tweak any network plot of class network.

tsna::plotPaths(
  g,
  paths = tsna::tPath(g, v = 12, direction = "fwd", start = 0, end = 3),
  displaylabels = FALSE,        # remove the vertex labels, to prevent too much visual clutter
  vertex.col = "white", 
  edge.label.cex = 1.5          # the size of the printed times
)

A related concept is that of “temporal reachability.” The tsna::tReach function computes, for each vertex, the number of vertices that are temporally reachable over the entire observation period.
If you want to compute this for a specific time segment, first use networkDynamic::network.extract to extract the segment of interest and then feed this to the tsna::tReach function.

  • tsna::tReach(g, direction = "fwd", start = 10, end = 20) The function to calculate the temporal reachable sets using only temporally forward steps (you can also specify direction = "bkwd" to determine by how many vertices each vertex can be temporally reached).

9.3 Network visualization

Temporal networks can be visualized in two ways. First, static plots can be made of a temporal network, either by collapsing the temporal network into a single static network or by breaking it up into static networks of specific time segments. Second, the network can be shown as a dynamic animation.

9.3.1 Visualizing as static networks

An obvious way to visualize the entire temporal network as a static network is to simply use plot(g).

Alternatively, the temporal network can be split into smaller time segments, and these network slices can be plotted as static representations.

There are two functions that can do this. The ndtv package has the ndtv::filmstrip function, which does this as follows:

  • ndtv::filmstrip(g, frames = 9) This plots the network at 9 points in time. It does not provide an overview of how the network changes over time, but it provides a series of snapshots (9, in this example) of the network. If the timing of the edges is in continuous time, this function has the tendency to plot nearly empty graphs, as it evaluates the networks at specific time points, rather than time intervals.

The SNA4DS package implements a function that divides the specified time period into time segments of equal time length and plots each segment as a static network. This is useful to see how the network changes over time. It also works nicely for networks where changes happen in continuous time.

SNA4DS::plot_network_slices(g, number = 9)

A sometimes useful function is ndtv::proximity.timeline, which shows the distance between the vertices over time. The main purpose is to see how the vertices move vis-a-vis each other over time (based on the geodesic path distance), and it often helps to see where and when subgroups are forming.

The function call is:

ndtv::proximity.timeline(g,  start = 10, end = 50, 
                         time.increment = .5,
                         mode = 'isoMDS')

where you can change the mode to a different scaling algorithm. For actual research projects, you want to try various settings and check which gives you the most informative output for the data at hand.

The function allows you to set many arguments (such as labels and colors).

9.3.2 Visualizing as a dynamic network animation

The ndtv package includes functions to create an animation of how the network unfolds over time. There are many arguments you can tweak, so here we only focus on the main approach. Make sure to consult the package help for more details.

There are two steps in creating a dynamic visualization in ndtv: you first run ndtv::compute.animation, which determines coordinates and other aspects of the dynamic plot. Second, you run ndtv::render.d3movie, which, you guessed it, renders the actual movie.

# step 0: unfortunately, we have to load the package into our session
library(ndtv)

# step 1: compute the settings
ndtv::compute.animation(g, animation.mode = "kamadakawai",
                        slice.par = list(start = 0, end = 45,
                                         aggregate.dur = 1,
                                         interval = 1, rule = "any"))

# step 2, render the animation
ndtv::render.d3movie(g, usearrows = TRUE, displaylabels = FALSE ,
                     bg = "#111111",
                     edge.col = "#55555599",
                     render.par = list(tween.frames = 15, 
                                       show.time = TRUE),
                     d3.options = list(animationDuration = 1000,
                                       playControls = TRUE,
                                       durationControl = TRUE),
                     output.mode = 'htmlWidget'
                     )

Some important arguments for ndtv::render.d3movie include:

  • launchBrowser: defaults to TRUE: determines whether the animation will be shown in the Browser after rendering.
  • output.mode: the kind of output you want (defaults to ‘HTML’)
  • filename: The file name of the HTML or JSON file to be generated. Only relevant if you picked ‘HTML’ or ‘JSON’ as output.mode.

Further, you can set most of the common graphical parameters, such as vertex.col, label.cex, usearrows, edge.lwd, et cetera.

If you want to fix vertices to the same location throughout the animation, you do this as follows:

# use some way to determine a matrix of vertex locations
coords <- ndtv::network.layout.animate.kamadakawai(g)

# add the x and y coordinates as vertex attributes
# adapt onset and terminus if required
networkDynamic::activate.vertex.attribute(g, "x", coords[, 1], 
                                          onset = -Inf, terminus = Inf)
networkDynamic::activate.vertex.attribute(g, "y", coords[, 2], 
                                          onset = -Inf, terminus = Inf)

# compute the new animation settings
# We now use `animation.mode = "useAttribute"`
ndtv::compute.animation(g, animation.mode = "useAttribute",
                        slice.par = list(start = 0, end = 45,
                                         aggregate.dur = 1,
                                         interval = 1, rule = "any"))